Various Approaches to Web Information Processing

نویسندگان

  • Kristína Machova
  • Peter Bednár
  • Marian Mach
چکیده

The paper focuses on the field of automatic extraction of information from texts and text document categorisation including pre-processing of text documents, which can be found on the Internet. In the frame of the presented work, we have devoted our attention to the following issues related to text categorisation: increasing the precision of categorisation algorithm results with the aid of a boosting method; searching a minimum number of decision trees, which enables the improvement of the categorisation; the influence of unlabeled data with predicted categories on categorisation precision; shortening click streams needed to access a given web document; and generation of key words related with a web document. The paper presents also results of experiments, which were carried out using the 20 News Groups and Reuters-21578 collections of documents and a collection of documents from an Internet portal of the Markiza broadcasting company.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

متن کامل

بهینه‌سازی اجرا و پاسخ صفحات وب در فضای ابری با روش‌های پیش‌پردازش، مطالعه موردی سامانه‌های وارنیش و انجینکس

The response speed of Web pages is one of the necessities of information technology. In recent years, renowned companies such as Google and computer scientists focused on speeding up the web. Achievements such as Google Pagespeed, Nginx and varnish are the result of these researches. In Customer to Customer(C2C) business systems, such as chat systems, and in Business to Customer(B2C) systems, s...

متن کامل

Load Balancing Approaches for Web Servers: A Survey of Recent Trends

Numerous works has been done for load balancing of web servers in grid environment. Reason behinds popularity of grid environment is to allow accessing distributed resources which are located at remote locations. For effective utilization, load must be balanced among all resources. Importance of load balancing is discussed by distinguishing the system between without load balancing and with loa...

متن کامل

A procedure for Web Service Selection Using WS-Policy Semantic Matching

In general, Policy-based approaches play an important role in the management of web services, for instance, in the choice of semantic web service and quality of services (QoS) in particular. The present research work illustrates a procedure for the web service selection among functionality similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...

متن کامل

طبقه‎بندی کاربردی کارکردهای عوامل نرم‎افزاری هوشمند و تطبیق آنها با ویژگی‎های وب‎سایت‎های کتابخانه‎های دیجیتال

Purpose: Web services are presently considered as technologies with highest number of applications for the purpose of providing the automatic, high-quality, and fast information interactions. The aim of this paper is therefore to provide a comprehensive framework for a collection of significant services offered by Farsi websites in libraries to be used in future designs. It also aims to classif...

متن کامل

روش جدید متن‌کاوی برای استخراج اطلاعات زمینه کاربر به‌منظور بهبود رتبه‌بندی نتایج موتور جستجو

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computing and Informatics

دوره 26  شماره 

صفحات  -

تاریخ انتشار 2007